AFTER FINAL EXPEDITED PROCEDURE 



Appl. No. 10/061,384 

Amdt» dated November 28, 2 005 

Reply to Office Action of September 9, 2 005 

This listing of claims replaces all prior versions, and 
listings of claims in the instant application: 

Listing of Claims; 

1. (Currently Amended) A method of optimizing 
instructions included in a program being executed, the method 
comprising : 

collecting information describing a frequency of 
occurrence of a plurality of cache misses caused by at least 
one instruction; 

identifying a performance degrading instruction; 

optimizing the program during runtime to provide an 
optimized sequence of instructions, the optimized sequence of 
instructions comprising at least one prefetch instruction; and 

modifying the program being executed to include the 
opt imi zed sequence . 



2. (Original) The method of claim 1, wherein the program 
comprises a plurality of sequence of instructions. 



3.. (Original) The method of claim 1, wherein the 
performance degrading instruction contributes to highest 
frequency of occurrence of the plurality cache misses. 



4. (Original) The method of claim 1, wherein the 
performance degrading instruction contributes to highest 
degradation in the program performance. 
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5. (Original) The method of claim 1, wherein the at 
least one instruction is the performance degrading instruction. 



6. (Original) The method of claim 1, wherein optimizing 
the program comprises inserting the at least one prefetch 
instruction prior to the performance degrading instruction. 



7. (Original) The Method of claims 1, wherein the 
plurality cache misses are LLL2/L3 cache misses. 



8. (Original) The method of claim 1, wherein the 
optimized sequence is prepared while the program is placed in a 
suspend mode . 



9. (Original) The method of claim 8, wherein modifying 
the program comprises: 

changing the program from the suspend mode to the 
execution mode. 
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10. (Original) The method of claim 1, wherein optimizing 
the program comprises: 

receiving information describing a dependency graph for 
the at least one instruction; 

determining whether a cyclic dependency pattern exists in 
the dependency graph; 

if the cyclic dependency pattern exists then, computing 
stride information derived from the cyclic dependency pattern; 
and 
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inserting the prefetch instruction derived from the stride 
information, the prefetch instruction being inserted into the 
program prior to the performance degrading instruction. 

11. (Original) The method of claim 10, wherein the 
dependency graph is a backward slice from the performance 
degrading instruction . 



12. (Original) The method of claim 1, wherein modifying 
the program comprises: 

storing the optimized sequence; 

redirecting a sequence of instructions having the 
performance degrading instruction to include the optimized 
sequence . 



13. (Original) A method of optimizing a program comprising 
a plurality of execution paths, the method comprising: 

collecting information describing a plurality of 
occurrences of a plurality of cache miss events during a 
runtime mode of the program; 

identifying a performance degrading execution path in the 
program ; 

modifying the performance degrading execution path to 
define an optimized execution path, the optimized execution 
path comprising at least one prefetch instruction; 
storing the optimized execution path; and 
redirecting the performance degrading execution path in 
the program to include the optimized execution path. 
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14. (Original) The method of claim 13, wherein the 
plurality of cache miss events are caused by an execution of a 
plurality of performance degrading instructions. 



15. (Original) The method of claim 13, wherein 
identifying the performance degrading path comprises 
identifying a performance degrading instruction contributing to 
highest plurality of occurrences of cache miss events. 



16. (Original) The method of claim 13, wherein the 
optimized execution path is defined while placing the program 
in a suspend mode from the runtime mode. 



17. (Original) The method of claim 16, wherein the 
optimized execution path is executed on resuming the runtime 
mode of the program code from the suspend mode . 



18. (Original) The method- of claim 16, wherein 
redirecting the performance degrading execution path comprises; 

changing the program mode from the suspend mode to the 
execution mode. 



19. (Original) The method of claim 13, wherein the 
performance degrading execution path comprises a performance 
degrading instruction causing the cache miss event. 
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20. (Original) The method of claim 19, wherein the at 
least one prefetch instruction is inserted prior to the 
performance degrading instruction. 

21. (Original) The method of claim 13, wherein 
identifying the performance degrading execution path comprises 
determining whether a cache miss event of the plurality of 
cache miss events is an L2/L3 cache miss. 

22. (Original) The method of claim 13, wherein 
identifying the performance degrading path comprises 
identifying a performance degrading instruction contributing to 
highest degradation in the program performance. 
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23. (Original) The method of claim 13, wherein modifying 
the performance degrading execution path comprises: 

receiving information describing a dependency graph for a 
program degrading instruction contributing to highest 
occurrence of the plurality of cache miss events, the 
performance degrading instruction being included in the 
performance degrading execution path; 

determining whether a cyclic dependency pattern exists in 
the graph; 

if the cyclic dependency pattern exists then, computing 
stride information derived from the cyclic dependency pattern; 
and 

inserting the at least one prefetch instruction derived 
from the stride information, the at least one prefetch 
instruction being inserted into the optimized execution path 
prior to the performance degrading instruction. 
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24. (Original) The method of claim 23, wherein the 
dependency graph is a backward slice from the performance 
degrading instruction. 



25. (Original) A method of optimizing a program, the 
method comprising: 

receiving information describing a dependency graph for an 
instruction causing frequent cache misses, the instruction 
being included in the program; 

determining whether a cyclic dependency pattern exists in 
the graph; 

if the cyclic dependency pattern exists then, computing 
stride information derived from the cyclic dependency pattern; 

inserting an at least one prefetch instruction derived 
from the stride information, the at least one prefetch 
instruction being inserted into the program prior to the 
instruction causing the frequent cache misses; 

reusing the at least one prefetch instruction in the 
program for reducing subsequent cache misses; and 

performing said receiving, said determining, said 
computing, said inserting and said reusing during runtime of 
the program. 
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26. (Currently Amended) A volatile or non-volatile 
computer- readable medium having a computer program accessible 
therefrom, wherein the computer program comprises instructions 
for : 

collecting information describing a frequency of 
occurrence of a plurality of cache misses caused by at least 
one instruction; 

identifying a performance degrading instruction; 
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optimizing the computer program to provide an optimized 
sequence of instructions, the optimized sequence of 
instructions comprising at least one prefetch instruction; and 

modifying the computer program being executed to include 
the optimized sequence during runtime . 



27. (Currently Amended) A computer system comprising: 
a processor; 

a memory coupled to the processor; 

a program comprising instructions, the program being 
stored in memory, the processor executing instructions to: 

collect information describing a frequency of occurrence 
of a plurality of cache misses caused by at least one 
instruction; 

identify a performance degrading instruction; 

optimize the program to provide an optimized sequence of 
instructions, the optimized sequence of instructions comprising 
at least one prefetch instruction; and 

modify the program being executed to include the optimized 
sequence during runtime . 
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